Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Reintroduce nnz and nonzeros #6963

Merged
merged 1 commit into from
May 27, 2014
Merged

RFC: Reintroduce nnz and nonzeros #6963

merged 1 commit into from
May 27, 2014

Conversation

ViralBShah
Copy link
Member

Reintroduce nnz and nonzeros as discussed in #6769
Deprecate nfilled
Deprecate nonzeros for StridedArray and BitArray
find()/findnz() over sparse matrices no longer filters out stored zeros

Should sparse() allow stored zeros in its input, using an optional argument? Although this discussion started with nnz, I feel it is leading towards better clarity on treatment of stored zeros, and overall I am feeling good about that.

@ViralBShah ViralBShah added this to the 0.3 milestone May 25, 2014
@ViralBShah ViralBShah self-assigned this May 25, 2014
@ViralBShah ViralBShah changed the title Reintroduce nnz and nonzeros RFC: Reintroduce nnz and nonzeros May 25, 2014
return (I, J, V)
end

nonzeros!(S::SparseMatrixCSC) = S.nzval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a convention to use ! for return types? Sorry that I ask but I just was not aware of that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - it is not.

@tknopp
Copy link
Contributor

tknopp commented May 25, 2014

I am still a little surprised that nonzeros returns a copy. I can understand the reasoning behind it (the sparse object should not become corrupted). But is this a common pattern? This kind of cries for something like an read-only array.

@ViralBShah
Copy link
Member Author

This would certainly be the expected behaviour. For access to the underlying data structures of SparseMatrixCSC, we should probably have accessor methods. We will do that early in 0.4. For now, people already directly pick up S.nzval.

@mlubin
Copy link
Member

mlubin commented May 26, 2014

Returning a copy might be the expected behavior, but it also makes nonzeros essentially useless for the vast majority of cases that I'm aware of. The name nonzeros! seems like an abuse of notation; it doesn't make any modifications to its argument.

I don't really see any benefit that these methods bring. If you're using nonzeros!, you already know enough about the data structure to go ahead and use S.nzval. If you don't know the data structure, then the list of nonzero entries isn't too useful since you don't know what indices they correspond to, and findnz would be used instead. This plus the huge naming conflict leads me to say that we should just get rid of nonzeros altogether.

@JeffBezanson
Copy link
Sponsor Member

The ! should only be used if the argument is modified.

@ViralBShah
Copy link
Member Author

@mlubin I have used nonzeros in this manner quite a bit when I used to work with MATLAB. I think the name is not confusing, and the behaviour is not a surprise to those who expect it. nonzeros! is already gone.

Just out of curiosity, @eldadHaber - Can your use case of nonzeros be replaced by findnz? The only downside of findnz is that it would allocate memory for indexes, which you may not want.

Cc: @tkelman

@@ -75,4 +75,10 @@ Sparse matrices support much of the same set of operations as dense matrices. Th

Return the symmetric permutation of A, which is ``A[p,p]``. A should be symmetric and sparse, where only the upper triangular part of the matrix is stored. This algorithm ignores the lower triangular part of the matrix. Only the upper triangular part of the result is returned as well.

.. function:: nonzeros!(A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got left over

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Removed it.

@mlubin
Copy link
Member

mlubin commented May 26, 2014

@ViralBShah, what's a typical use case where one would want to make a copy of nzval?

@tkelman
Copy link
Contributor

tkelman commented May 26, 2014

find()/findnz() over sparse matrices no longer filters out stored zeros

Not sure that's a good idea, you're just moving the inconsistent treatment of zeros from nonzeros into find. I think find should be left alone, if a function for converting CSC to COO leaving stored zeros intact is desired I think it should be a separate function (or conversion if we add a full-fledged COO type).

A construction flag in sparse to allow explicit zeros sounds reasonable.

At this point I don't see a whole lot of use for nonzeros(A) relative to A.nzval, except I guess for the sake of naming familiarity from Matlab where you don't typically have direct access to the underlying CSC data.

@ViralBShah
Copy link
Member Author

A.nzval is not recommended. We really need to have an api for that, or else we will be stick with the internal structure.

@tknopp
Copy link
Contributor

tknopp commented May 26, 2014

I am with @mlubin and @tkelman here. At this point nonzeros is just a convenience function for matlab users and funnily the Matlab function should be even more efficient as it has COW semantics.

Of course we have similar issues with all the temporaries that are created in the arithmetic vector operations but unlike + where one really has no chance to do anything but create a temporary array, here the array is already allocated.

So from the user perspective "the right thing" to do is currently to use the fields of SparseMatrixCSC. This is ok if we decide that fields are part of an interface (see #4935 and #1974). But reading through both issues I don't see that there is already a conclusion about this. If we decide that only functions are the public interface and field overloading is not coming, then we would need accessor methods like e.g. nonzeros.

Maybe we should introduce a convention that functions that accessor functions that may corrupt the state of an object should be named nonzeros💀 ;-)

But joking aside I think that we need decisions on #1974 as this will affect the way Base will involve during the journey to 1.0

To this PR, I still like @mlubin 's (#5538) naming convention better as it is more explicite and circumvents the uncertainty that we have on structural and actual non-zeros. And I still do not get the pressing reason to make Matlab compatibility so important (other than 1 user beeing unhappy). But as we have talked about that already in #6769 I think it is time to just make a decision for 0.3.

@tkelman
Copy link
Contributor

tkelman commented May 26, 2014

What's the concern with field access, name stability? Wanting to apply the same code to other sparse formats with potentially different field names?

I doubt I'd use an API that's dramatically slower than field access when both are equally convenient. Having a non-copy field accessor function could be worth using, the concern about modification there is more or less the same as with slices, transposes, etc potentially returning views.

@tknopp
Copy link
Contributor

tknopp commented May 26, 2014

@tkelman. Yes its about name stability and "good practice". It is always good to separate the interface from the implementation. Using functions for interfaces is whats currently done in Base (e.g. size, length) and performance should not be an issue when using inlining. But again looking at #4935 and #1974 there is not yet a clear conclusion whether this is the route to follow in the future. Field overloading would break that paradigm and make fields part of an interface. Note that my comment is about a non-copying accessor.

It is true that one could argue that the interface of SparseMatrixCSC is nothing that will be reused so that information hiding is probably not that important here. Still I think it would be good to have conventions that we follow even in cases where it is probably not so important.

@lindahua
Copy link
Contributor

I understanding some of the reasoning behind the copy. However, knowing that nzval will be copied in nonzeros would make me circumvent this function altogether and go directly to nzval in practice.

I would suggest nonzeros returns the vector of the non zero values itself (or a view thereof), so that they can be accessed without being copied or directly manipulated. It should also be clearly documented and we should warn the user about potential consequence of modifying the elements there.

@ViralBShah
Copy link
Member Author

One choice is to leave nonzeros as is and have a separate api that returns all the internal vectors without making copies.

@stevengj
Copy link
Member

I agree with @lindahua; nonzeros should just return the data, not a copy.

@ViralBShah
Copy link
Member Author

nonzeros no longer returns a copy.

@mlubin
Copy link
Member

mlubin commented May 26, 2014

Cool, now on to the next controversial issue :)
I'd agree with @tkelman that find and findnz should remove the explicitly stored zeros. If a user is interacting with a sparse matrix at this higher level, I don't think the concept of structural nonzeros needs to be introduced (except maybe as a keyword argument?)

@ViralBShah
Copy link
Member Author

I wasn't sure about that one, but wanted to try it out. I think it is best to remove it and avoid the cognitive overload.

@ViralBShah
Copy link
Member Author

Alright, what else do you want? :-)

@ViralBShah
Copy link
Member Author

Good point - done.


Return a vector of the structural nonzero values in sparse matrix ``A``. This includes zeros that are explicitly stored in the sparse matrix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe indicate here that the vector has mutating access to the original values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

@mlubin
Copy link
Member

mlubin commented May 26, 2014

I'm satisfied on all of the substantial issues here. I think we're just missing deprecations from the 0.2 version of nnz that applied to general vectors:

@deprecate nnz(v::AbstractVector) countnz

(assuming that works correctly). Also NEWS has a mention of nfilled that should be updated.

@ViralBShah
Copy link
Member Author

Why only AbstractVector? What about AbstractArray?

@ViralBShah
Copy link
Member Author

I mean we should do it for StridedArray.

@mlubin
Copy link
Member

mlubin commented May 26, 2014

That's probably right.

@@ -320,7 +320,7 @@ Deprecated or removed

* `factorize!` is deprecated in favor of `factorize`. ([#5526])

* `nnz` is removed. Use `countnz` or `nfilled` instead ([#5538])
* `nnz` counts the number of structural nonzeros in a sparse matrix. Use `countnz` for the actual number of nonzeros. ([#6769])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a link farther down for the issue [#5538]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the link as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more accurate to replace counts with returns:

nnz returns the number of structural nonzeros ...

This method is not doing the counting.

Deprecate nfilled for SparseMatrixCSC.
Deprecate nonzeros for StridedArray and BitArray.
Deprecate nnz for StridedArray.
Update documentation for structural nonzeros.
Update NEWS
@mlubin
Copy link
Member

mlubin commented May 26, 2014

Anyone have remaining strong objections?

@tknopp
Copy link
Contributor

tknopp commented May 26, 2014

LGTM. Thanks @ViralBShah for taking all objections into account!

@lindahua
Copy link
Contributor

@ViralBShah I think it is great. Thanks.

ViralBShah added a commit that referenced this pull request May 27, 2014
RFC: Reintroduce nnz and nonzeros
@ViralBShah ViralBShah merged commit 8879b88 into master May 27, 2014
@ViralBShah ViralBShah deleted the vs/nnz branch May 27, 2014 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants